Simultaneous Reduction of Number of Spots and Energy Layers in Intensity Modulated Proton Therapy for Rapid Spot Scanning Delivery

Reducing proton treatment time improves patient comfort and decreases the risk of error from intra-fractional motion, but must be balanced against clinical goals and treatment plan quality. We formulated the proton treatment planning problem as a convex optimization problem with a cost function consisting of a dosimetric plan quality term plus a weighted $l_1$ regularization term. We iteratively solved this problem and adaptively updated the regularization weights to promote the sparsity of both the spots and energy layers. The proposed algorithm was tested on four head-and-neck cancer patients, and its performance was compared with existing standard $l_1$ and group $l_2$ regularization methods. We also compared the effectiveness of the three methods ($l_1$, group $l_2$, and reweighted $l_1$) at improving plan delivery efficiency without compromising dosimetric plan quality by constructing each of their Pareto surfaces charting the trade-off between plan delivery and plan quality. The reweighted $l_1$ regularization method reduced the number of spots and energy layers by an average over all patients of 40% and 35%, respectively, with an insignificant cost to dosimetric plan quality. From the Pareto surfaces, it is clear that reweighted $l_1$ provided a better trade-off between plan delivery efficiency and dosimetric plan quality than standard $l_1$ or group $l_2$ regularization, requiring the lowest cost to quality to achieve any given level of delivery efficiency. In summary, reweighted $l_1$ regularization is a powerful method for simultaneously promoting the sparsity of spots and energy layers at a small cost to dosimetric plan quality. This sparsity reduces the time required for spot scanning and energy layer switching, thereby improving the delivery efficiency of proton plans.


Introduction
Intensity modulated proton therapy (IMPT) is typically delivered via the pencil beam scanning technique.The patient is irradiated by a sequence of proton spots, arranged laterally to cover the treatment volume, where the depth of penetration of each spot is determined by its energy layer.During IMPT, protons are transmitted spot-by-spot within every energy layer, and layer-by-layer within every beam over a set of fixed-angle beams.The total plan delivery time is roughly equal to the total switching time between beams (gantry rotation plus beam setup time), switching time between energy layers, travel time between spots, and dose delivery time at each spot [GCM + 20, ZSL + 22].
In this study, we seek to reduce IMPT delivery time by reducing the number of proton spots and energy layers.A shorter treatment time is desirable because it improves patient comfort, increases patient throughput (hence lowering treatment costs), and decreases the risk of error due to intra-fractional motion [LZZ15, SPS + 16, MCN + 20].Simultaneously, we want to ensure clinical goals are met and the quality of the treatment plan remains uncompromised.This trade-off between delivery time and plan quality has been the subject of abundant research.
One thread of research focuses on greedy algorithms for energy layer assignment.These algorithms combine a variety of techniques for control point sampling, energy layer distribution, energy layer filtration, and spot optimization [DLZ + 16].The goal is to reduce energy layer switching time by pruning the number of energy layers and sequencing them so layer switches only occur from low-to-high energy level [LLZ + 20, EBW + 22].
Another strand of research takes a structured optimization-based approach.For example, Müller and Wilkens (2016) [MW16] directly minimize the sum of spot intensities as part of a prioritized optimization routine.Other authors formulate the proton treatment delivery problem as a mixed-integer program (MIP), where each energy layer [CLL + 14] or path between layers [WSGJ + 22, WZSG + 22] is associated with a binary indicator variable.Their objective is to minimize the dose fidelity (e.g., over/underdose to the target) plus some penalty or constraints on the energy layers, which promote a lower switching time.Although mathematically elegant, these MIPs are computationally difficult to solve, as they scale poorly with the number of energy layers due to the combinatorial nature of the problem.
To avoid this issue, researchers turned their attention to continuous optimization models.A proton treatment planning problem in this category only contains continuous variables, like spot intensities and doses.Typically, the objective includes a regularization function that is selected to encourage sparsity (i.e., more zero elements) in the spots and energy layers.The regularizer applies a penalty to the total spot intensity within each layer group.A variety of options have been proposed for this penalty function: logarithm [vdWKHH15, vdWBA + 20], l 2,1/2 norm [GRL + 20], and l 2 norm [JHP + 18, LCL + 19, GCM + 20, WSGJ + 22].The last of these is of particular interest because it is convex and widely used in statistics for promoting group sparsity; the associated regularizer is known as the group lasso [YL06, MvdGB08, LH15, IPR16].
The standard lasso (i.e., l 1 norm penalty) promotes sparsity of the spot intensities, but does not directly penalize the energy layers.The group lasso (i.e., group l 2 norm penalty) promotes sparsity of the energy layers, but actually increases the number of nonzero spots.Since IMPT delivery time depends on both the number of spots and energy layers [GCM + 20], neither of these regularizers is ideal.In this paper, we propose a new regularization method that simultaneously reduces the number of nonzero spots and energy layers, while upholding treatment plan quality.Our reweighted l 1 method combines the l 1 penalty from standard lasso with a weighting mechanism that differentiates between the spots of different energy layers, similar to the group lasso.We test the proposed method on four head-and-neck cancer patients and demonstrate its ability to 1) reduce the number of spots and energy layers simultaneously, and 2) provide a better trade-off between dosimetric plan quality and plan delivery efficiency than existing regularization methods (i.e., standard l 1 and group lasso).
2 Methods and materials

Problem formulation
We discretize the patient's body into m voxels and the proton beams into n spots.For each spot j ∈ {1, . . ., n}, we calculate the radiation dose delivered by a unit intensity of that spot to voxel i ∈ {1, . . ., m} and call this value A ij .The dose influence matrix is then A ∈ R m×n + , where its rows correspond to the voxels and its columns to the spots.Let p ∈ R m + be the prescription vector, i.e., p i equals the physician-prescribed dose if i is a target voxel and zero otherwise.We concatenate the spot intensities of all beams and all energy layers within each beam, denoting this collective as the vector x ∈ R n .The typical treatment planning problem seeks a vector of spot intensities x that minimizes the deviation of the delivered dose, d = Ax, from the prescription p.This deviation can be decomposed into a penalty on the overdose, d = (d−p) + = max(d−p, 0), and the underdose, d = (d−p) − = − min(d−p, 0), which we combine to form the cost function where w, w ∈ R n + are penalty parameters that determine the relative importance of the over/underdose to the treatment plan.(Note the underdose is ignored for non-target voxels i because p i = 0).Dose constraints are defined for each anatomical structure.For a given structure s, let A s ∈ R ms×n + be the row slice of A containing only the rows of the m s voxels in s.A maximum dose constraint takes the form of A s x ≤ d max s , where d max s is an upper bound.Similarly, a mean dose constraint is of the form 1 ms 1 T A s x ≤ d mean s . By stacking the constraint matrices/vectors for all S structures, we can represent the set of dose constraints as a single linear inequality Bx ≤ c, where Then, our treatment planning problem is (3) (The derivation is provided in appendix A).Problem 3 is a convex quadratic program (QP), hence can be solved using standard convex methods, e.g., the alternating direction method of multipliers (ADMM) [BPC + 11, FTZ23] or interior-point methods [Wri97,Gor22].The reader is referred to S. Boyd and L. Vandenberghe (2004) [BV04] and J. Nocedal and S. J. Wright (2006) [NW06] for a thorough discussion of convex optimization.

Common regularizers
The cost function defined in 1 focuses solely on the difference of the delivered dose from the prescription, i.e., the dosimetric plan quality.However, in our treatment scenario, we are also interested in reducing the dose delivery time, i.e., increasing the plan delivery efficiency.The delivery time is positively correlated with the number of nonzero spots (spot scanning rate) and nonzero energy layers (energy switching time) [GCM + 20, PEL + 18].Thus, we want to augment the objective of problem 3 with a regularization function r : R n → R, which penalizes the spot vector x in a way that reduces the number of nonzero spots/layers, while maintaining high plan quality.The regularized treatment planning problem is with respect to x, d, and d.Here we have introduced a regularization weight λ ≥ 0 to balance the trade-off between dosimetric plan quality, represented by the cost f (d, d), and plan delivery efficiency, as captured by the regularization term r(x).A larger value of λ places more importance on efficiency.
In the following subsections, we review a few regularization functions that have been suggested in the literature.Let J = {1, . . ., n} and G = {J 1 , . . ., J G } be a set of subsets of J , where each J g ⊆ J has exactly n g ≤ n elements.Specifically in our setting, G represents a partition of n spots into G energy layers with J g containing the indices of the n g spots in layer g.

l 0 regularizer
One method of reducing the delivery time is to directly penalize the number of nonzero spots.This can be accomplished via the l 0 regularizer r 0 (x) = ∥x∥ 0 = card({j : which we have defined as the number of nonzero elements in x. (Here card(A) denotes the cardinality of set A). Unfortunately, the l 0 regularization function is computationally expensive to implement.To solve problem 4 with r = r 0 , we would need to solve a series of large mixed-integer programs in order to determine the optimal subset of nonzero spots out of all possible combinations from J [BKM16].As the number of spots n is typically very large (on the order of 10 3 to 10 4 ), this quickly becomes computationally intractable.
Another option is to apply the l 0 regularizer to the energy layers: In this case, r0 returns the number of nonzero energy layers, where a layer g is zero if and only if all its spots are zero, i.e., j∈Jg x j = 0.The associated combinatorial problem or MIP simplifies to finding the optimal subset of nonzero layers, which is more manageable since G is typically on the order of 10 2 .Cao et al. (2014) [CLL + 14] developed an iterative method to solve an approximation of this MIP and were able to reduce the number of proton energies in their IMPT plan, while satisfying certain dosimetric criteria.Nevertheless, as combinatorial optimization is still expensive, we turn our attention to a different regularization function.

l 1 regularizer
A common approximation of the l 0 regularizer is the l 1 norm.Define the l 1 regularization function to be This function is closed, convex, and continuous.When used as a regularizer in problem 4, it produces a convex optimization problem -a form of the lasso problem -that promotes sparsity in the solution vector x [Tib96].The lasso problem is well-studied in the literature [VBL13, HTW15], and various methods have been developed to solve it quickly and efficiently [PH07, SFR07, WL08, SYOS10].One downside of the l 1 regularizer is that it does not differentiate between energy layers and thus is insensitive to the number of layers: since the spot vector is nonnegative, the absolute value of its elements |x j | = x j , and any sum over energy layers decouples into the sum over all spots G g=1 j∈Jg |x j | = n j=1 x j .As energy layer switching time is typically longer than spot delivery or travel time, l 1 regularization is not the most effective method for improving plan delivery efficiency.

Group l 2 regularizer
The group l 2 regularizer, also known as the group lasso, provides an alternative method for efficiently implementing group penalties.This regularization function is defined as In our proton treatment delivery scenario, the group lasso is capable of differentiating between energy layers, and thus is a good regularizer for reducing the number of nonzero layers.However, it also tends to increase the number of nonzero spots within the active layers, as the quadratic penalty term in equation 8 mostly ignores small spot intensities (square of a small x j is near zero).This failure to generate sparsity in the spot vector due to the characteristics of the l 2 norm make it an inadequate regularizer for our purposes.

Reweighted l 1 method
As we have discussed, the l 1 regularization function promotes sparsity of the spots, but not the energy layers.The group l 2 regularization function promotes sparsity of the layers, but not the spots -indeed, it tends to produce dense spot vectors due to the l 2 norm.In this section, we introduce the reweighted l 1 regularization method, which promotes sparsity in both the spots and the energy layers.
The reweighted l 1 method assigns a weight to every spot based on its intensity and energy layer.The weights are chosen to counteract the intensity of each layer, so that all spots, regardless of intensity, contribute roughly equally to the total regularization penalty.This is done to imitate the "ideal" group l 0 regularizer (equation 6), which counts every nonzero layer as one unit (due to card) regardless of intensity.
Formally, we define the weighted group l 1 regularizer with weight parameters β g ∈ R + ∪ {+∞} for g = 1, . . ., G.An intuitive way to set β g is to make it inversely proportional to the optimal total intensity of energy layer g, i.e., , where x ⋆ ∈ R n + is an optimal spot vector.However, we do not know x ⋆ beforehand.We will approximate this weighting scheme iteratively using the reweighted l 1 method.
The reweighted l 1 method is a type of majorization-minimization (MM) algorithm, which solves an optimization problem by iteratively minimizing a surrogate function that majorizes the actual objective function.MM algorithms have a rich history in the literature [FBDN07, SSW10, Mai15, SBP17], and reweighted l 1 in particular has been used to solve problems in portfolio optimization [LFB07], matrix rank minimization [Faz02, FHB03], and sparse signal recovery [CWB08].Research has shown that it is fast and robust, outperforming standard l 1 regularization in a variety of settings.
We now describe the reweighted l 1 method in our treatment planning setting.Let the initial weights 2. Compute the total intensity of each energy layer e (k) 3. Lower threshold the solution where g ′ for some small δ ∈ (0, 1). 4. Update the weights.First, compute the standardized reciprocals Then, calculate the scaling term The new weights are β g .

5.
Terminate on convergence of the objective, or when k reaches a user-defined maximum number of iterations K. Step 3 was introduced to ensure stability of the algorithm, so that a zero energy layer estimate e (k) g = 0 would not preclude the subsequent estimate e (k+1) g from being nonzero.In our computational experiments, we found that a threshold fraction of δ = 0.01 produced good results.
Step 4 was added to ensure the l 1 regularization term (r 1 (x)) and the reweighted l 1 term (r 3 (x)) contribute a similar amount to the objective.To this end, the reweighted l 1 term is scaled by µ The reweighted l 1 method has a number of advantages over the other regularizers we have reviewed here.It encourages sparsity in energy layers by properly grouping the l 1 penalty.It spreads this penalty evenly across all energy layers using its weighting scheme, rather than prioritizing those spots with large magnitudes.Finally, it is easy to implement: each iteration of the algorithm only requires that we solve a simple convex problem with l 1 regularization, which can be done efficiently using many off-the-shelf solvers.Moreover, the number of reweighting iterations needed in practice is typically very low, with most of the improvement coming from the first two to three iterations, so its computational cost is overall low.As we will see in the next section, reweighted l 1 outperforms regular l 1 and group l 2 penalties in sparsifying spots/layers.

Patient population and computational framework
We compared the reweighted l 1 method with standard l 1 and group l 2 regularization on four head-and-neck cancer patient cases from The Cancer Imaging Archive (TCIA) [CVS + 13, BdOCM].For each patient, we created the dose influence matrix using the proton pencil beam calculation engine in the open-source package MatRad [WCW + 17, WWG + 18], assuming a relative biological effectiveness (RBE) of 1.1.The proton spots were situated on a rectangular grid with a spot spacing of 5 mm, and the grid covered the entire PTV plus 1 mm out from its perimeter.The voxel resolution and dose grid resolution were both 0.98 mm × 0.98 mm × 2 mm.Every patient plan was generated using two co-planar beams with beam angles selected using a Bayesian optimization algorithm [THS + 20].Table 1 provides more details.
Each patient had a single planning target volume (PTV) that was prescribed a dose of p = 70 Gy delivered in 35 fractions of 2 Gy per fraction.The mean/max dose bounds for important structures are given in 1, without exhaustive search, to obtain reasonable treatment plans.In the majority of cases, w i = 1, w i = 10 for PTV voxels i and w i = 10 −3 for all other voxels i provided the best results.
We implemented the standard l 1 , group l 2 , and reweighted l 1 based treatment planning methods in Python using CVXPY [DB16, AVDB18, DAM + 22] and solved the associated optimization problems with MOSEK [ApS22].All computational processes were executed on a 64-bit PC with an AMD Ryzen 9 3900X CPU @ 3.80 GHz/ 12 cores and 128 GB RAM.For reweighted l 1 , we ran the algorithm for K = 3 iterations due to the diminishing benefits of more iterations.
To facilitate comparisons, we scaled the group l 2 regularizer so it lay in the same range as the standard l 1 regularizer.First, we solved problem 4 with the standard l 1 regularizer (equation 7) and λ = 1.Let us call this solution x (1) .Then, we computed a scaling term η > 0 such that ηr 2 (x (1) ) = r 1 (x (1) ).When we ran the group l 2 method, we used the scaled regularization function r2 (x; η) := ηr 2 (x) as the regularizer r(x) in problem 4.This allowed us to obtain better spot/energy layer comparison plots between the standard l 1 and group l 2 methods.As pointed out earlier, the reweighted l 1 method is similarly scaled via step 4 of the algorithm.
After each method finished, we trimmed the optimal spot vector x ⋆ further to increase sparsity.First, we zeroed out all elements x ⋆ j that fell below a fraction γ ∈ (0, 1) of the maximum spot intensity, i.e., we set x ⋆ j = 0 if x ⋆ j < γ max j ′ x ⋆ j ′ .We then zeroed out all energy layers of the resulting x⋆ that fell below the same fraction of the maximum layer intensity: for each g ∈ {1, . . ., G}, we set x⋆ A choice of γ = 0.01 provided a reasonable trade-off between sparsity and dose coverage in our computational experiments.

Simultaneous reduction of spots and energy layers
We first compared the results of the regularization methods on a single patient.Figure 1 depicts optimal spot intensities of the unregularized model and the l 1 , group l 2 , and reweighted l 1 regularized models for patient 2. For all three regularizers, a regularization weight of λ = 5 was used; this choice accentuated the difference between their spot vectors.Without regularization, about one third of the total 7011 spots are nonzero, with individual spot intensities ranging between 10 3 and 10 4 .Under l 1 regularization, that fraction is reduced to only 13%, or 918 nonzero spots, as the l 1 penalty encourages further sparsity.By contrast, with group l 2 regularization, the number of active spots increases significantly to 5099 or 72%, while the average intensity drops to a little over 10 3 .Reweighted l 1 regularization produced a spot vector with the lowest number of nonzero elements: just 541 or 7.7% of the spots are nonzero.For patient 2, these active spots tend to reside in the first beam, and their maximum intensity exceeds that of the other methods.
Figure 2 shows the optimal intensity of the energy layers for patient 2, using the three regularization models, all with λ = 5.Over 95% of the total 70 energy layers are nonzero under the unregularized model.These active layers are divided fairly evenly into two clusters, which coincide with the two beams delineated by the vertical red line.The total intensity of each energy layer averages between 10 4 and 10 5 .With l 1 regularization, the fraction of nonzero energy layers drops to a modest 80%, where most of that reduction comes from deactivated layers at the edges of the clusters.Group l 2 regularization results in a steeper drop in the fraction of active energy layers, down to 61% with additional sparsity in the middle of both beam clusters.However, the reweighted l 1 method performs better than both of these methods, cutting the number of nonzero energy layers down to only 18 -a reduction of over 75% -with a commensurate increase in the intensity of the active layers.
A summary of the results from the different regularization methods is given in Figure 3.For a fixed λ, it is clear that reweighted l 1 achieves the lowest number of nonzero spots and nonzero energy layers out of all the methods.Unregularized

Trade-off between delivery efficiency and PTV coverage
The regularization weight in the previous section was chosen to highlight the distinctions between the optimal intensity plots.However, λ must be carefully selected to balance the tradeoff between the total delivery time (highly correlated with the sparsity of the spots/energy layers) and the quality of the resulting treatment plan.Figure 4 examines this trade-off for reweighted l 1 regularization on patient 2 using two measures of plan quality: D98% and D2% for the PTV.For different values of λ, we solved problem 4 using the reweighted l 1 method, counted up the number of nonzero spots/energy layers, and calculated the optimal dose vector and PTV dose percentiles.We then plotted a point corresponding to this result in each of the subfigures of Figure 4 with the sparsity metric on the vertical axis and the plan quality metric on the horizontal axis (e.g., the unregularized point λ = 0 is marked by a triangle △).By connecting the points in each subfigure, we obtained a set of Pareto optimal curves, which show the trade-off between plan delivery efficiency and plan quality.
The top left subfigure depicts the number of nonzero spots versus D98% to the PTV for λ ranging from zero to 6.0 (marked by the square).As λ increases, the number of active spots decreases, but so does D98%.A choice of λ = 0.95 (marked by the star) achieves the lowest number of nonzero spots ≈ 720, while still maintaining D98% above 95% of the prescription, indicated by the vertical gray dotted line at 66.5 Gy.A similar plot can be seen in the bottom left subfigure, which shows the number of nonzero energy layers versus D98% to the PTV; the same choice of λ yields 27 active layers.On the righthand side, the subfigures display the number of nonzero spots (top) and energy layers (bottom) versus D2% to the PTV.As the regularization weight increases, D2% also increases, but never exceeds 108% of the prescription (as indicated by the dotted line at 75.6 Gy) for any λ ≤ 0.95.Thus, out of all the weights, λ = 0.95 yields a good trade-off between sparsity and PTV coverage: it achieves a reduction of 89% and 61% in the number of spots and energy layers, respectively, while still fulfilling all target dose constraints.) for various values of the regularization weight λ, computed using the reweighted l 1 method on patient 2. As λ increases, the curves sweep from the △ marker (λ = 0.0) to the ■ marker (λ = 6.0).The vertical gray dotted lines indicate clinical dose constraints on the PTV (D98% > 0.95p and D2% < 1.08p, where p = 70 Gy is the prescription), and the red arrows indicate the directions of desirable change (increasing D98%, decreasing D2%, and decreasing number of nonzero spots/energy layers).A choice of weight λ = 0.95, marked by the ⋆, produces a plan with good sparsity that respects the clinical constraints.

Trade-off between delivery efficiency and overall plan quality
This section studies the Pareto optimal trade-off curves between spot/energy layer sparsity and treatment plan quality using different regularizers to determine which regularization method provides the best trade-off, i.e., the largest increase in sparsity for the least decrease in plan quality.Rather than plotting multiple dose-volume metrics, we focus on a single consolidated quality measure: the plan cost function (equation 1), which includes dose fidelity terms for the PTV and all organs-at-risk (OARs).A lower value of f (d, d) at the optimum implies a higher quality treatment plan.
To facilitate comparison, we also focus on the relative change in sparsity (number of nonzero spots/energy layers) and plan cost with respect to the unregularized solution.Let x unreg be the optimal spots resulting from the unregularized problem 3, and x reg be the optimal spots resulting from a particular regularization method.We evaluate the plan cost function given in equation 1 on x unreg (with d unreg = max(Ax unreg − p, 0) and d unreg = − min(Ax unreg − p, 0)) to obtain c unreg , and similarly on x reg to obtain c reg .Then, the relative percentage change in plan cost for the regularizer is 100(c reg − c unreg )/c unreg .The relative change in the number of nonzero spots (s) and number of nonzero energy layers (l) is defined in a similar fashion as 100(s reg − s unreg )/s unreg and 100(l reg − l unreg )/l unreg , respectively.Thus, to construct the trade-off curve, we solve the regularized problem for various values of λ and plot the relative change in sparsity versus the relative change in plan cost at each solution point.
For every patient, Figure 5 depicts the relative percentage change in the number of nonzero spots versus the relative percentage change in plan cost for the l 1 , group l 2 , and reweighted l 1 regularization methods.The origin corresponds to the unregularized plan (λ = 0).Both the l 1 and reweighted l 1 trade-off curves drop sharply from the origin, attaining on average a 30% to 45% decrease in nonzero spots for a less than 10% increase in plan cost, with reweighted l 1 slightly outperforming l 1 by on average 5 percentage points over all four patients.By contrast, the number of nonzero spots rises with group l 2 regularization, increasing up to 140% within the first 10% to 15% increase in plan cost for all except patient 4.This is consistent with our spot intensity plot for patient 2 (Figure 1), which shows the spot distribution is denser under group l 2 than without regularization.
Figure 6 depicts the relative percentage change in the number of nonzero energy layers versus the relative percentage change in plan cost for the three regularization methods.Both l 1 and group l 2 trade-off curves decrease moderately from the origin, with group l 2 averaging about 9.5% lower number of nonzero layers for a given percentage increase in cost.This matches our observations in Figures 2 and 3 that the group l 2 function is more effective at penalizing energy layers than the l 1 norm.
However, the reweighted l 1 method significantly outperforms both these regularizers.For patient 2, it achieves an over 50% decrease in the number of nonzero energy layers for a less than 10% increase in plan cost.For the other patients, it provides a 25% to 35% reduction in active energy layers with a less than 15% increment in plan cost.The average reduction in the number of nonzero layers from reweighted l 1 exceeds the best reduction from group l 2 by 12 percentage points, and the majority of this reduction is realized with only about 10% cost to treatment plan quality, relative to the unregularized plan.
The vertical dotted lines in Figures 5 and 6 for patient 2 correspond to a 10% increase in the plan cost.The intersection of these lines with the Pareto curves of different regu-larization methods demonstrates the reduction in the number of nonzero spots and energy layers obtained using different regularizers.Figure 7 (top right) depicts the dose-volume histogram (DVH) curves of the unregularized plan, the l 1 regularized plan, and the reweighted l 1 regularized plan at the same 10% relative change in plan cost.(We chose not to show the DVHs of group l 2 regularized plans because they overlap considerably with the DVHs of corresponding l 1 regularized plans, and as we saw earlier, group l 2 results in far worse delivery efficiency than l 1 or reweighted l 1 regularization).Compared to no regularization, the reweighted l 1 method reduces the number of active spots and energy layers by more than 50%, while providing relatively similar DVH curves with different trade-offs (compromised left parotid and PTV coverage/homogeneity, and improved right parotid and mandible).One can re-adjust the PTV/OAR weights in the plan cost function of the reweighted l 1 problem to achieve more uniform trade-offs.In the same vein, compared to standard l 1 regularization, reweighted l 1 reduces the number of active spots and energy layers by about 10% and 40%, respectively, while producing almost identical DVH curves.The Pareto curves and DVHs of the other patients, also plotted at roughly 10% relative change in plan cost, tell a similar story.

Relative improvement in delivery time
The total plan delivery time is dependent on a multitude of machine-specific factors.In this section, we provide an estimate of how regularization directly impacts the delivery time, assuming a specific set of machine parameters.The total delivery time (T ) can be approximated by protons/second.We compute T using the unregularized model and the l 1 , group l 2 , and reweighted l 1 regularized models, with regularization weight λ chosen such that each regularized plan achieves a 10% cost to plan quality.Then, we plot the relative percentage change in delivery time of each regularized model with respect to the unregularized model (i.e., 100(T reg − T unreg )/T unreg ).
The results for patient 2 are shown in Figure 8.Standard l 1 reduces delivery time by a modest 13%, while group l 2 actually raises delivery time by 2% due to the 149% increase in the number of nonzero spots, which increases the spot delivery and spot travel time.By contrast, reweighted l 1 achieves a 44% reduction in delivery time through its simultaneous reduction of the number of nonzero spots and nonzero energy layers by 57% and 51%, respectively.This reduction comes at only a minor cost to the PTV and mandible -D2% to the PTV goes up by 4% and maximum dose to the mandible goes up by 2%.Results for the other patients reflect similar outcomes, with reweighted l 1 reducing total delivery time by between 20% and 30%; see Table 3  Figure 8: Relative change in delivery time and other plan metrics with respect to the unregularized model for l 1 , group l 2 , and reweighted l 1 regularization for patient 2. For each regularizer, λ was chosen such that the regularized model resulted in about 10% cost to plan quality.Here the conformity index (CI) is defined as the total number of voxels that received at least 95% of the prescribed dose, divided by the number of voxels in the PTV [PSL17].

Discussion
This study proposed a method to improve the delivery efficiency of pencil beam scanning proton plans by simultaneously reducing the number of spots and energy layers using reweighted l 1 regularization.One can exactly model the spot/energy layer reduction problem using the l 0 regularizer, which in principle would improve plan delivery at the smallest possible cost to plan quality, but the l 0 -regularized optimization problem is nonconvex and computationally prohibitive to solve.In imaging science and statistics, researchers often employ the l 1 norm as a convex surrogate for the l 0 norm, and in some cases (e.g., compressed sensing), the l 1 norm has proven to be just as effective as the l 0 norm at promoting sparsity [CRT06].The reweighted l 1 regularization method was proposed [CWB08] to bridge the gap between the l 0 regularizer and the l 1 regularizer by better approximating the l 0 norm, while retaining the convexity of the l 1 norm.In proton treatment planning, this property translates to improving the plan delivery efficiency at a lower cost to plan quality, which we have demonstrated in this work.Our limited computational experiments on four head-and-neck cancer patients show that, for the same cost to plan quality, the reweighted l 1 method reduced the number of nonzero spots by up to 10 percentage points more than standard l 1 and the number of nonzero energy layers by 25 to 30 percentage points more than group l 2 regularization.Promoting spot/energy layer sparsity to improve plan delivery in IMPT is analogous to promoting beam profile smoothness to improve plan delivery in IMRT.Prior research has shown that plan delivery efficiency in IMRT can be significantly improved at minimal cost to dosimetric plan quality due to the phenomenon of degeneracy [AMNR02, LASP04].The structure of the treatment planning problem results in a multitude of feasible plans with near-equal objective value (i.e., quality).This same phenomenon has been observed in IMPT planning problems [vdWBA + 20, vdWKHH15, GCM + 20, LCL + 19], although unlike IMRT, it currently lacks a rigorous mathematical analysis.Our computational experiments demonstrated that with the reweighted l 1 method, one can reduce the number of spots and energy layers by on average 40% and 35%, respectively, without significantly compromising the dosimetric plan quality.
In this study, we have adopted a constrained optimization framework, where the dosimetric plan quality is represented by a quadratic term in the objective, the sparsity promotion is carried out via a regularization penalty term, and the mean/max clinical dose criteria are enforced by hard constraints.However, the proposed reweighted l 1 method is agnostic to the optimization framework and can also be used in conjunction with an automation tool (e.g., hierarchical optimization [ZHZ + 22, BSH09, THDZ20], multiple criteria optimization (MCO) [CB08,MKBT08], knowledge-based planning (KBP) [AMT + 12, SNZ + 20]).DVH constraints and plan robustness may be integrated into the optimization problem using existing techniques in the literature [FUXB19, ZST + 13, MHDZ20, FTZ23, THDZ20].To limit the scope of this paper, we have not incorporated robustness into our formulation of the treatment planning problem.A preliminary robustness analysis of 13 range and setup uncertainty scenarios shows that the reweighted l 1 method generates plans with a similar level of robustness to the unregularized model's plans.Moreover, even without accounting for these uncertainties in the optimization, the DVH curves and clinical metrics of the plans lie within an acceptable range across the majority of scenarios.Appendix B.1 provides details on our analysis and the resulting figures.
Finally, we mention that in this proof-of-concept work, we have not enforced the machinespecific minimum-monitor-unit (min-MU) constraint.One can enforce the min-MU constraint using a two-step optimization method as described in Lin et al. (2019) [LCL + 19]: the first step identifies the active spots/energy layers (i.e., those with intensity greater than a pre-determined value), and the second step removes the inactive spots/energy layers and enforces the min-MU constraint on the remaining spots.Increasing the min-MU threshold also allows for a higher dose rate, which can accelerate the delivery of each spot.This is especially important because the intensities of the active spots usually increase with the overall sparsity of the spots/energy layers in the treatment plan.Gao et al. (2020) [GCM + 20] suggested using different min-MU thresholds for each energy layer to further increase the dose rate and expedite spot delivery.

Conclusion
The reweighted l 1 regularization method is capable of simultaneously reducing the number of spots and energy layers in a proton treatment plan, while imposing minimal cost to dosimetric plan quality.Moreover, it achieves a better trade-off between delivery efficiency and plan quality than standard l 1 and group l 2 regularization.Thus, reweighted l 1 regularization is a powerful method for improving the delivery of proton therapy.

A Derivation of problem 3
Problem 2 in the main text is a nonconvex optimization problem because the equality constraints d = (Ax − p) + and d = (Ax − p) − are nonlinear.We will show that it is equivalent to the convex optimization problem 3, which replaces the nonlinear equality constraints with the linear equality constraint Ax − d + d = p.Since the objective functions are identical, it is sufficient to prove that any solution of problem 2 is a feasible point for problem 3 and vice versa.Let v ⋆ := (x ⋆ , d ⋆ , d ⋆ ) be a solution of problem 2.Then, because any real function can be written as the sum of its nonnegative and nonpositive parts.Thus, v ⋆ is feasible for problem 3, and since the two problems have the same objective function, v ⋆ is a solution of problem 3. Now let us show the converse.First, we will relax the equality constraints in problem 2 to get minimize This relaxed problem is equivalent to problem 2 because the relaxed constraints, d ≥ (Ax − p) + and d ≥ (Ax − p) − , are tight at the optimum, since the objective function 1 is monotonically increasing over the nonnegative reals.Any feasible point (x, d, d) of problem 14 that does not satisfy d = (Ax − p) + and d = (Ax − p) − cannot be optimal, as we can perturb it to obtain another feasible point with a strictly lower objective value.For example, suppose di > min(Ax − p, 0) i for a voxel i ∈ {1, . . ., m}, then we can decrease di by some small quantity δ > 0 to get d := d − δe i ≥ 0 that still satisfies the inequality constraint.(Here e i ∈ {0, 1} n is the indicator vector of index i, which equals 1 at i and 0 elsewhere).The new point (x, d, d) is feasible for problem 14, but attains a lower objective f ( d, d) < f ( d, d) because the quadratic term w i d2 i < w i d2 i by our choice of δ > 0, assuming w i ̸ = 0.If w i = 0, the underdose to voxel i has no impact on the objective and can be removed entirely from the optimization problem.
It suffices then to show that any solution v Thus, we can conclude that

B Delivery efficiency and plan quality metrics
In this section, we present further results on the delivery time and other clinical metrics.Figure 9 is an extension of Figure 8 in the main text, depicting the relative percentage change in total delivery time, number of nonzero spots/energy layers, and several dosimetric quantities at 10% cost to plan quality for patients 1, 3, and 4. Table 3: Values of various metrics under the unregularized plan and the l 1 , group l 2 , and reweighted l 1 regularized plans.For each patient and regularizer, λ was chosen such that the regularized plan resulted in a 10% cost to plan quality relative to the unregularized plan (e.g., as shown by the dotted line in Figure 5).

Figure 1 :
Figure 1: Optimal spot intensities resulting from the unregularized model and the l 1 , group l 2 , and reweighted l 1 regularized models (λ = 5) for patient 2. The vertical red line divides the spots associated with the first beam (1-3516) from the second beam (3517-7011).

Figure 2 :Figure 3 :
Figure 2: Sum of spot intensities in each energy layer (1-70) for the unregularized model and the l 1 , group l 2 , and reweighted l 1 regularized models (λ = 5) for patient 2. The vertical red line divides the layers associated with the first beam (1-33) from the second beam (34-70).

Figure 4 :
Figure4: Number of nonzero spots (top) and energy layers (bottom) versus PTV dose percentile (left: D98%; right: D2%) for various values of the regularization weight λ, computed using the reweighted l 1 method on patient 2. As λ increases, the curves sweep from the △ marker (λ = 0.0) to the ■ marker (λ = 6.0).The vertical gray dotted lines indicate clinical dose constraints on the PTV (D98% > 0.95p and D2% < 1.08p, where p = 70 Gy is the prescription), and the red arrows indicate the directions of desirable change (increasing D98%, decreasing D2%, and decreasing number of nonzero spots/energy layers).A choice of weight λ = 0.95, marked by the ⋆, produces a plan with good sparsity that respects the clinical constraints.

Figure 5 :Figure 6 :Figure 7 :
Figure 5: Relative change in number of nonzero spots versus relative cost to plan quality with respect to the unregularized model.For patient 2, reweighted l 1 regularization achieves a 57% reduction in the number of nonzero spots at only a 10% cost to overall plan quality, relative to the unregularized model, as indicated by the vertical gray dotted line.
where h b (x) is the number of nonzero beams, h e (x) is the number of nonzero energy layers, h s (x, I g ) is the number of nonzero spots in energy layer g, T b is the beam switching time (gantry rotation plus beam setup time), T e is the energy layer switching time, T s is the spot travel time, and T d is the proton dose rate [GCM + 20, ZSL + 22].Following van de Water et al. (2015) [vdWKHH15], we let T b = 30 seconds, T e = 2 seconds, T s = 0.01 seconds, and T d = 4×10 11 60

Figure 10 :Figure 11 :
Figure 10: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l 1 regularized plan for patient 1.The solid lines indicate the DVH curves of the nominal scenario.

Figure 12 :Figure 13 :
Figure 12: DVH bands across all uncertainty scenarios obtained from the unregularized plan and the reweighted l 1 regularized plan for patient 3. The solid lines indicate the DVH curves of the nominal scenario.

Table 2 .
We manually adjusted the weights in cost function

Table 2 :
Dose constraints for each structure.N/A means the max/mean dose was unbounded, i.e., d max = +∞ or d mean = +∞.
and Figure9in appendix B for additional data and plots.
sity: The Lasso and Generalizations.Monographs on Statistics and Applied Probability.Taylor and Francis, 2015.
[PSL17]rovides the absolute values for the metrics plotted in the two figures.Together, they show that reweighted l 1 regularization outperforms l 1 and group l 2 regularization in all patients.Figure 9: Relative change in delivery time and other plan metrics with respect to the unregularized model for l 1 , group l 2 , and reweighted l 1 regularization.For each regularizer, λ was chosen such that the regularized model resulted in about 10% cost to plan quality.Here the conformity index (CI) is defined as the total number of voxels that received at least 95% of the prescribed dose, divided by the number of voxels in the PTV[PSL17].Group l 2 Reweighted l 1